Smart Paper Notes

Contributions of Shape, Texture, and Color in Visual Recognition

Yunhao Ge, Yao Xiao, Zhi Xu, Xingrui Wang, Laurent Itti

Categories: Computer Vision

2022-07-19

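We investigate the contributions of three important features of the human visual system (HVS), namely shape, texture, and color, to object classification. We build a humanoid vision engine (HVE) that explicitly and separately computes shape, texture, and color features from images. The resulting feature vectors are then concatenated to support the final classification. We show that HVE can summarize and rank-order the contributions of the three features to object recognition. We use human experiments to confirm that both HVE and humans predominantly rely on a few specific features to classify specific classes (for example, texture is the dominant feature for distinguishing a zebra from other quadrupeds, for both humans and HVE). With the help of HVE, given any environment (dataset), we can summarize the most important features for the whole task (task-specific) and for each class (class-specific). To further demonstrate the usefulness of HVE, we use it to simulate the open-world zero-shot learning ability of humans without attribute labels. Finally, we show that HVE can also simulate human imagination by combining different features. We will open-source the HVE engine and the corresponding datasets.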
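The concatenation step described above can be illustrated with a minimal sketch. The function names, the toy vectors, and the linear classification head are assumptions made for illustration only; the abstract does not specify how HVE extracts or classifies features.

```python
def concat_features(shape_vec, texture_vec, color_vec):
    """Concatenate three per-feature vectors into one classifier input."""
    return list(shape_vec) + list(texture_vec) + list(color_vec)


def linear_classify(features, weights, bias):
    """Hypothetical linear head: return the index of the highest-scoring class."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vectors standing in for the outputs of three separate feature extractors.
shape_vec = [0.2, 0.1]
texture_vec = [0.9, 0.8]
color_vec = [0.3]

features = concat_features(shape_vec, texture_vec, color_vec)

# Two made-up classes: class 0 looks only at the shape slots,
# class 1 looks only at the texture slots.
weights = [
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 0.0],
]
bias = [0.0, 0.0]

predicted = linear_classify(features, weights, bias)
print(features, predicted)
```

Because each feature occupies a fixed slice of the concatenated vector, the weights attached to each slice give one rough way to compare how much a class depends on shape, texture, or color.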

Translated by Google Translate

How Can Graph Neural Networks Help Document Retrieval: A Case Study on CORD19 with Concept Map Generation

Hejie Cui, Jiaying Lu, Yao Ge, Carl Yang

Categories: Natural Language Processing | Machine Learning

2022-01-12

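Graph neural networks (GNNs), a powerful family of tools for representation learning on irregular data, have shown superiority in various downstream tasks. With unstructured text represented as concept maps, GNNs can be exploited for tasks such as document retrieval. To explore how GNNs can help document retrieval, we conduct an empirical study on CORD-19, a large-scale multi-discipline dataset. The results show that, on top of BM25-retrieved candidates, our proposed semantics-oriented graph functions achieve better and more stable performance than complex structure-oriented GNNs such as GINs and GATs. The insights from this case study can serve as a guideline for future work on developing GNNs with appropriate semantics-oriented inductive biases for textual reasoning tasks such as document retrieval and classification. All code for this case study is available at https://github.com/hennyjie/gnn-docrocrocal.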
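The candidate stage mentioned above relies on BM25 ranking. The following is a minimal sketch of Okapi BM25 scoring, not the authors' retrieval pipeline. The whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for d in tokenized:
        doc_freq.update(set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for frequent terms.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0
            )
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "coronavirus transmission in hospital settings",
    "graph neural networks for text classification",
    "vaccine trials for coronavirus",
]
scores = bm25_scores("coronavirus vaccine", docs)
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

In a setup like the one the abstract describes, the top-ranked candidates from this stage would then be passed to a graph-based model for re-ranking.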

Encouraging Disentangled and Convex Representation with Controllable Interpolation Regularization

Yunhao Ge, Zhi Xu, Yao Xiao, Gan Xin, Yunkui Pang, Laurent Itti

Categories: Computer Vision

2021-12-06

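We focus on controllable disentangled representation learning (C-DIS-RL), in which users control the partition of the disentangled latent space in order to factorize dataset attributes (concepts) for downstream tasks. Current methods leave two general problems under-explored: (1) they lack comprehensive disentanglement constraints, in particular the minimization of mutual information between different attributes across the latent and observation domains; (2) they lack convexity constraints in the disentangled latent space, which are important for meaningfully manipulating specific attributes in downstream tasks. To encourage comprehensive C-DIS-RL and convexity at the same time, we propose a simple yet effective method, Controllable Interpolation Regularization (CIR), which creates a positive loop in which disentanglement and convexity help each other. Specifically, we perform controlled interpolation in the latent space during training and reuse the encoder to help form a "perfect disentanglement" regularization. In this setting, (a) the disentanglement loss implicitly enlarges the latent "understandable" distribution, which encourages convexity; (b) convexity in turn improves robust and precise disentanglement. CIR is a general module, and we merge it with three different algorithms, ELEGANT, I2I-DIS, and GZS-Net, to show its compatibility and effectiveness. Qualitative and quantitative experiments show that CIR improves C-DIS-RL and latent convexity. This further improves downstream tasks: controllable image synthesis, cross-modality image translation, and zero-shot synthesis. Additional experiments show that CIR can also improve other downstream tasks, such as new attribute value mining, data augmentation, and eliminating bias for fairness.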
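The controlled interpolation step can be sketched as follows. This is an illustrative assumption about what interpolating one attribute of a partitioned latent code could look like, not the authors' implementation. The partition layout, the attribute names, and the latent values are hypothetical, and the re-encoding step that forms the regularization is not shown.

```python
def interpolate_attribute(z_a, z_b, partition, attribute, alpha):
    """Blend only the chosen attribute's slice of z_a toward z_b.

    partition maps an attribute name to its (start, end) slice of the latent.
    alpha = 0 keeps z_a's value, alpha = 1 takes z_b's value.
    """
    start, end = partition[attribute]
    z_mix = list(z_a)
    for i in range(start, end):
        z_mix[i] = (1 - alpha) * z_a[i] + alpha * z_b[i]
    return z_mix


# Hypothetical user-defined partition of a 4-dimensional latent code.
partition = {"identity": (0, 2), "pose": (2, 4)}

z_a = [1.0, 1.0, 0.0, 0.0]
z_b = [5.0, 5.0, 2.0, 4.0]

# Interpolate the "pose" slice halfway; the "identity" slice is untouched.
z_mix = interpolate_attribute(z_a, z_b, partition, "pose", 0.5)
print(z_mix)
```

If the latent space is convex and well disentangled, decoding `z_mix` and encoding the result again should recover the same interpolated slice, which is the intuition behind reusing the encoder as a regularizer.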

Medical Matting: A New Perspective on Medical Segmentation with Uncertainty

Lin Wang, Lie Ju, Xin Wang, Wanji He, Donghao Zhang, Yelin Huang, Zhiwen Yang, Xuan Yao, Xin Zhao, Xiufen Ye

Categories: Computer Vision

2021-06-18

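It is difficult to manually and accurately label ambiguous, complex-shaped targets with binary masks. This weakness of binary masks is especially pronounced in medical image segmentation, where blurring is prevalent. When multiple annotators are involved, reaching a consensus among clinicians through binary masks is even more challenging. Moreover, these uncertain regions are related to the structure of the lesion and may contain anatomical information that is useful for diagnosis. However, current research on uncertainty mainly focuses on uncertainty in model training and in data labels. None of it investigates the influence of the ambiguous nature of the lesion itself. Inspired by image matting, we introduce the alpha matte as a soft mask to represent uncertain regions in medical scenes, and accordingly propose a new uncertainty quantification method to fill the gap in uncertainty research on lesion structure. In this work, we introduce a new architecture that generates binary masks and alpha mattes in a multi-task framework and outperforms all state-of-the-art matting algorithms it is compared with. The proposed uncertainty map highlights ambiguous regions, and our novel multi-task loss weighting strategy further improves performance and demonstrates concrete benefits. To fully evaluate the effectiveness of the proposed method, we first label three medical datasets with alpha mattes, which addresses the shortage of matting datasets for medical scenes, and we show, both qualitatively and quantitatively, that the alpha matte is a more effective labeling method than the binary mask.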
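The difference between a binary mask and an alpha matte can be shown with a small sketch. The threshold of 0.5 and the ambiguity score below are simple assumptions chosen for illustration; they are not the uncertainty map proposed in the paper.

```python
def binarize(alpha_matte, threshold=0.5):
    """Collapse a soft alpha matte into a hard 0/1 mask."""
    return [[1 if a >= threshold else 0 for a in row] for row in alpha_matte]


def ambiguity(alpha_matte):
    """Illustrative per-pixel ambiguity: 0 at alpha 0 or 1, peaking at alpha 0.5."""
    return [[1.0 - abs(2.0 * a - 1.0) for a in row] for row in alpha_matte]


# Toy 2x3 alpha matte: values strictly between 0 and 1 mark uncertain pixels.
alpha = [
    [0.0, 0.25, 1.0],
    [0.5, 0.75, 1.0],
]

mask = binarize(alpha)
amb = ambiguity(alpha)
print(mask)
print(amb)
```

The pixels with alpha 0.5 and 1.0 both become 1 in the binary mask, so the information that one of them is highly uncertain is lost. The soft label keeps it.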

GasHis-Transformer: A Multi-scale Visual Transformer Approach for Gastric Histopathology Image Classification

Haoyuan Chen, Chen Li, Xiaoyan Li, Ge Wang, Weiming Hu, Yixin Li, Wanli Liu, Changhao Sun, Yudong Yao, Yueyang Teng

Categories: Computer Vision

2021-04-29

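Existing deep learning methods for gastric cancer diagnosis commonly use convolutional neural networks. Recently, the Visual Transformer has attracted great attention because of its performance and efficiency, but its applications are mostly in the field of computer vision. This paper proposes GasHis-Transformer, a multi-scale visual transformer model for gastric histopathology image classification (GHIC), which automatically classifies microscopic gastric images into abnormal and normal cases. The GasHis-Transformer model consists of two key modules, a global information module and a local information module, which together extract histopathological features effectively. In our experiments, a public hematoxylin and eosin (H&E) stained gastric histopathology dataset with 280 abnormal and normal images is divided into training, validation, and test sets at a ratio of 1:1:2. On the test set of this dataset, the precision, recall, F1 score, and accuracy are 98.0%, 100.0%, 96.0%, and 98.0%, respectively. In addition, a critical study is conducted to evaluate the robustness of GasHis-Transformer, in which ten different noises are added, including four adversarial attacks and six conventional image noises. A clinically meaningful study is also carried out to test the gastrointestinal cancer identification performance of GasHis-Transformer on 620 abnormal images, reaching an accuracy of 96.8%. Finally, a comparative study is performed to test generalizability on H&E and immunohistochemically stained images from a lymphoma image dataset and a breast cancer dataset, yielding comparable F1 scores (85.6% and 82.8%) and accuracies (83.9% and 89.4%), respectively. In conclusion, GasHis-Transformer demonstrates high classification performance and shows significant potential for the GHIC task.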
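Precision, recall, F1 score, and accuracy are standard binary classification metrics. The following sketch shows how they are conventionally computed from confusion-matrix counts. The counts used here are made up for illustration and are not taken from the paper, and the function assumes at least one predicted positive and one actual positive so that no division by zero occurs.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion counts."""
    precision = tp / (tp + fp)          # fraction of predicted positives that are correct
    recall = tp / (tp + fn)             # fraction of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts: 8 true positives, 2 false positives,
# 0 false negatives, 10 true negatives.
metrics = classification_metrics(tp=8, fp=2, fn=0, tn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```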

Learning Task-aware Robust Deep Learning Systems

Keji Han, Yun Li, Xianzhong Long, Yao Ge

Categories: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2020-10-11

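Many works have shown that deep learning systems are vulnerable to adversarial attacks. A deep learning system consists of two parts: the deep learning task and the deep model. Most existing works investigate the impact of the deep model on the robustness of deep learning systems and ignore the impact of the learning task. In this paper, we adopt binary and interval label encoding strategies to redefine the classification task and design the corresponding losses in order to improve the robustness of the deep learning system. Our method can be viewed as improving robustness from the perspectives of both the learning task and the deep model. Experimental results show that our learning-task-aware method is more robust than traditional classification while preserving accuracy.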
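The abstract does not spell out its encoding schemes, so the following is only one plausible reading of a binary label encoding: each class index is written as a fixed-length bit code instead of a one-hot vector, and a prediction is decoded to the class with the nearest code. The function names, the number of bits, and the squared-distance decoding rule are all assumptions.

```python
def encode_binary(label, n_bits):
    """Encode a class index as a list of bits, most significant bit first."""
    return [(label >> i) & 1 for i in reversed(range(n_bits))]


def decode_nearest(outputs, n_classes, n_bits):
    """Return the class whose bit code is closest to the model outputs."""
    def squared_distance(label):
        code = encode_binary(label, n_bits)
        return sum((o - c) ** 2 for o, c in zip(outputs, code))

    return min(range(n_classes), key=squared_distance)


# Ten classes fit into 4 bits, compared with 10 outputs for one-hot encoding.
code = encode_binary(5, 4)

# Hypothetical per-bit outputs of a model, for example sigmoid activations.
outputs = [0.1, 0.8, 0.3, 0.9]
predicted = decode_nearest(outputs, n_classes=10, n_bits=4)
print(code, predicted)
```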

Analogical Inference Enhanced Knowledge Graph Embedding

Yao Zhen, Zhang Wen, Chen Mingyang, Huang Yufeng, Yang Yi, Chen Huajun

Categories: Artificial Intelligence | Natural Language Processing

2023-01-03

Knowledge graph embedding (KGE), which maps entities and relations in a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult for KGEs to infer inductively. To address this challenge, we resort to analogical inference and propose AnKGE, a novel and general self-supervised framework that enhances KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects at the entity level, relation level, and triple level. In AnKGE, we train an analogy function for each level of analogical inference, which takes the original element embedding from a well-trained KGE model as input and outputs the analogical object embedding. To combine the inductive inference capability of the original KGE model with the analogical inference capability added by AnKGE, we interpolate the analogy score with the base model score and introduce adaptive weights into the score function used for prediction. Through extensive experiments on the FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on the link prediction task and performs analogical inference well.
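The score interpolation can be sketched in a few lines. This is a simplified illustration, not the AnKGE score function: it assumes a TransE-style base scorer, a single fixed weight in place of the adaptive weights, and made-up two-dimensional embeddings, including a hypothetical analogical head embedding.

```python
def transe_score(head, relation, tail):
    """Assumed base scorer: negative L1 distance of head + relation from tail."""
    return -sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def combined_score(base, analogy, weight):
    """Interpolate the base model score with an analogy score."""
    return base + weight * analogy


relation = [0.0, 1.0]
tail = [1.0, 2.0]

# Original head embedding from the (assumed) pretrained base model.
head = [1.0, 0.0]
# Hypothetical output of an entity-level analogy function for the same head.
analogy_head = [1.0, 0.5]

base = transe_score(head, relation, tail)
analogy = transe_score(analogy_head, relation, tail)
final = combined_score(base, analogy, weight=0.2)
print(base, analogy, final)
```

Here the fixed `weight` is a stand-in for the adaptive weights mentioned in the abstract.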

Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

Jiahao Zhu, Daizong Liu, Pan Zhou, Xing Di, Yu Cheng, Song Yang, Wenzheng Xu, Zichuan Xu, Yao Wan, Lichao Sun

Categories: Computer Vision

2023-01-02

Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query. All existing works first use a sparse sampling strategy to extract a fixed number of video frames and then perform multi-modal interactions with the query sentence for reasoning. However, we argue that these methods overlook two important issues: 1) Boundary bias: the annotated target segment generally refers to two specific frames as its start and end timestamps. The video downsampling process may lose these two frames and take adjacent irrelevant frames as the new boundaries. 2) Reasoning bias: such incorrect new boundary frames also bias the reasoning during frame-query interaction, reducing the generalization ability of the model. To alleviate these limitations, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames that enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationships among these frames and to generate soft labels on the boundaries for more accurate frame-query reasoning. This mechanism can also supply the sparsely sampled frames with the consecutive visual semantics they lack, which supports fine-grained activity understanding. Extensive experiments on three challenging datasets demonstrate the effectiveness of SSRN.
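The boundary bias caused by sparse sampling can be shown with a toy calculation. The sketch assumes evenly spaced sampling that takes the frame at the center of each interval and maps each annotated boundary to its nearest sampled frame. The frame counts and the annotation are made up, and actual TSG pipelines may sample differently.

```python
def uniform_sample(n_frames, n_samples):
    """Pick n_samples evenly spaced frame indices from a video of n_frames."""
    step = n_frames / n_samples
    return [int(step * i + step / 2) for i in range(n_samples)]


def snap_boundaries(start, end, sampled):
    """Map annotated boundaries to the nearest frames that survived sampling."""
    new_start = min(sampled, key=lambda f: abs(f - start))
    new_end = min(sampled, key=lambda f: abs(f - end))
    return new_start, new_end


# A 100-frame video reduced to 8 frames.
sampled = uniform_sample(100, 8)

# Hypothetical annotation: the target segment spans frames 25 to 60.
snapped = snap_boundaries(25, 60, sampled)
print(sampled)
print(snapped)
```

Neither annotated boundary frame survives the sampling, so the model is trained on boundaries that are several frames away from the true ones.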

CORGI-PM: A Chinese Corpus For Gender Bias Probing and Mitigation

Ge Zhang, Yizhi Li, Yaoyao Wu, Linyuan Zhang, Chenghua Lin, Jiayi Geng, Shi Wang, Jie Fu

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2023-01-01

As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, prevalent data-driven techniques such as large-scale language models suffer from inadequate data and biased corpora, especially for languages with insufficient resources such as Chinese. To this end, we propose CORGI-PM, a Chinese cOrpus foR Gender bIas Probing and Mitigation, which contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Moreover, we address three challenges for automatic textual gender bias mitigation, which require models to detect, classify, and mitigate textual gender bias. We also conduct experiments with state-of-the-art language models to provide baselines. To the best of our knowledge, CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.

Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits

Ruibo Liu, Chenyan Jia, Ge Zhang, Ziyu Zhuang, Tony X Liu, Soroush Vosoughi

Categories: Natural Language Processing | Artificial Intelligence

2023-01-01

We present Second Thought, a new learning paradigm that enables language models (LMs) to re-align with human values. By modeling the chain-of-edits between value-unaligned and value-aligned text, with LM fine-tuning and additional refinement through reinforcement learning, Second Thought not only achieves superior performance in three value alignment benchmark datasets but also shows strong human-value transfer learning ability in few-shot scenarios. The generated editing steps also offer better interpretability and ease for interactive error correction. Extensive human evaluations further confirm its effectiveness.
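To make the idea of a chain of edits between two texts concrete, the sketch below derives word-level edit steps with Python's standard `difflib` module. This is only a stand-in for illustration: the abstract does not describe how Second Thought constructs its edit chains, and the helper name and the example sentences are hypothetical.

```python
import difflib


def edit_chain(source, target):
    """List the word-level edit steps that turn source into target."""
    src, tgt = source.split(), target.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    steps = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is one of "replace", "delete", or "insert".
            steps.append((tag, " ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return steps


steps = edit_chain(
    "you are a terrible writer",
    "you are a developing writer",
)
print(steps)
```

Each step records what was changed and how, which is the kind of intermediate trace that makes an edit-based approach easier to inspect and correct than a single rewritten output.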
